# Adversarial learning
**Vits Vctk** — kakao-enterprise · MIT license
VITS is an end-to-end speech synthesis model that predicts a speech waveform directly from an input text sequence. It uses a conditional variational autoencoder (VAE) architecture comprising a posterior encoder, a decoder, and a conditional prior module.
Tags: Speech Synthesis · Transformers
3,601 · 13
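The card above describes VITS as a conditional VAE with a posterior encoder, a decoder, and a conditional prior. A minimal numpy sketch of that latent mechanics is below; this is not the VITS implementation, and all function names and shapes here are hypothetical. The posterior encoder's output is sampled via the reparameterization trick, and a Gaussian KL term ties the posterior to the text-conditional prior.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mean, log_var):
    # Sample z = mean + sigma * eps: a differentiable draw from the
    # posterior Gaussian (the standard VAE reparameterization trick).
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps

def kl_divergence(q_mean, q_log_var, p_mean, p_log_var):
    # KL(q || p) between two diagonal Gaussians, summed over latent dims.
    # In a conditional VAE like VITS, q comes from the posterior encoder
    # (audio side) and p from the text-conditional prior, so minimizing
    # this term aligns the latent space with the input text.
    var_q = np.exp(q_log_var)
    var_p = np.exp(p_log_var)
    kl = 0.5 * (p_log_var - q_log_var
                + (var_q + (q_mean - p_mean) ** 2) / var_p
                - 1.0)
    return kl.sum(axis=-1)

# Toy shapes: a batch of 2 frames with a 4-dimensional latent.
q_mean = rng.standard_normal((2, 4))
q_log_var = rng.standard_normal((2, 4))
z = reparameterize(q_mean, q_log_var)  # latent passed to the decoder
kl = kl_divergence(q_mean, q_log_var,
                   np.zeros((2, 4)), np.zeros((2, 4)))
print(z.shape, kl.shape)
```

In the real model the decoder maps `z` to a waveform and the loss adds reconstruction and adversarial terms; this sketch only shows the sampling and KL pieces named in the description.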
**Vits2 Ru Natasha** — frappuccino · MIT license
A Russian text-to-speech model based on the VITS2 architecture, trained on the Natasha dataset, providing efficient and natural speech synthesis.
Tags: Speech Synthesis · Transformers, Other
53 · 7
# Featured Recommended AI Models